median_absolute_error (MedAE)#
Median Absolute Error (MedAE) is a regression metric that summarizes the typical absolute prediction error by taking the median of absolute residuals.
Compared to MAE (mean absolute error), MedAE is much more robust to outliers: as long as fewer than 50% of samples are extreme, the median barely moves.
Learning goals#
Define MedAE precisely (math + units)
Build intuition with Plotly visuals (median vs mean, outliers)
Implement MedAE from scratch in NumPy (including multi-output)
Use MedAE as an optimization objective for a simple model (derivative-free search)
Know pros/cons and when to use it
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from sklearn.metrics import mean_absolute_error, mean_squared_error, median_absolute_error
pio.templates.default = 'plotly_white'
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
1) Definition#
Let \(y \in \mathbb{R}^n\) be targets and \(\hat{y} \in \mathbb{R}^n\) predictions.
Define the absolute errors:
$$ a_i = |y_i - \hat{y}_i|, \quad i = 1, \dots, n $$
Then the median absolute error is:
$$ \operatorname{MedAE}(y, \hat{y}) = \operatorname{median}(a_1, \dots, a_n) $$
If we sort the absolute errors \(a_{(1)} \le \dots \le a_{(n)}\):
if \(n\) is odd, \(\operatorname{median}(a)=a_{((n+1)/2)}\)
if \(n\) is even, \(\operatorname{median}(a)=\frac{a_{(n/2)} + a_{(n/2+1)}}{2}\) (NumPy / scikit-learn convention)
In quantile notation:
$$ \operatorname{MedAE}(y, \hat{y}) = Q_{0.5}\big(|y_1 - \hat{y}_1|, \dots, |y_n - \hat{y}_n|\big) $$
Multi-output targets#
If \(y \in \mathbb{R}^{n \times m}\) (\(m\) outputs), compute MedAE per output:
$$ \operatorname{MedAE}_j = \operatorname{median}_i\, |y_{ij} - \hat{y}_{ij}|, \quad j = 1, \dots, m $$
and then aggregate across outputs (uniform average or custom weights).
Units#
MedAE has the same units as the target \(y\) (e.g. dollars, °C).
# A tiny example with an extreme outlier
y_true = np.array([2.0, 0.0, 4.0, 1.0, 100.0])
y_pred = np.array([1.5, 0.2, 3.0, 2.0, 0.0])
abs_errors = np.abs(y_true - y_pred)
medae_np = float(np.median(abs_errors))
medae_sklearn = median_absolute_error(y_true, y_pred)
print('y_true =', y_true)
print('y_pred =', y_pred)
print('|error| =', abs_errors)
print('MedAE (NumPy) =', medae_np)
print('MedAE (sklearn) =', medae_sklearn)
print('MAE (compare) =', mean_absolute_error(y_true, y_pred))
print('RMSE (compare) =', np.sqrt(mean_squared_error(y_true, y_pred)))
y_true = [ 2. 0. 4. 1. 100.]
y_pred = [1.5 0.2 3. 2. 0. ]
|error| = [ 0.5 0.2 1. 1. 100. ]
MedAE (NumPy) = 1.0
MedAE (sklearn) = 1.0
MAE (compare) = 20.54
RMSE (compare) = 44.726479852543726
idx = np.arange(len(y_true))
mean_abs = float(np.mean(abs_errors))
fig = go.Figure()
fig.add_trace(go.Bar(x=idx, y=abs_errors, name='|y - ŷ|'))
fig.add_hline(
y=medae_np,
line_dash='dash',
line_color='black',
annotation_text=f'median={medae_np:.2f}',
)
fig.add_hline(
y=mean_abs,
line_dash='dot',
line_color='crimson',
annotation_text=f'mean={mean_abs:.2f}',
)
fig.update_layout(
title='Absolute errors: median vs mean',
xaxis_title='sample index',
yaxis_title='|error|',
)
fig.show()
2) Intuition: “typical” error (the 50th percentile)#
MedAE does not average errors. It asks:
“What is the error magnitude that half of my predictions beat?”
Equivalently, MedAE is the 50th percentile (median) of the absolute-error distribution.
If a few predictions are catastrophically wrong, they usually land in the upper tail of \(|\mathrm{error}|\).
The median ignores that tail unless it becomes more than half the data.
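Because MedAE is just the 0.5 quantile of the absolute errors, `np.quantile(abs_err, 0.5)` and `np.median(abs_err)` give the same value (the toy numbers below are made up for illustration):

```python
import numpy as np

# MedAE is the 50th percentile of |error|, so np.median and np.quantile agree
# (np.quantile's default 'linear' interpolation matches the median convention).
y_true = np.array([2.0, 0.0, 4.0, 1.0, 3.0])
y_pred = np.array([1.5, 0.2, 3.0, 2.0, 2.5])
abs_err = np.abs(y_true - y_pred)

via_median = float(np.median(abs_err))
via_quantile = float(np.quantile(abs_err, 0.5))
print(via_median, via_quantile)  # both 0.5
```

`sklearn.metrics.median_absolute_error` returns the same number for these inputs, since it uses the same median convention.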
n = 400
y_true = rng.normal(0, 1, size=n)
y_pred = y_true + rng.normal(0, 0.4, size=n)
# Inject a few extreme misses
out_idx = rng.choice(n, size=8, replace=False)
y_pred[out_idx] += rng.normal(0, 10.0, size=out_idx.size)
abs_errors = np.abs(y_true - y_pred)
med = float(np.median(abs_errors))
mean_ = float(np.mean(abs_errors))
fig = px.histogram(abs_errors, nbins=40, title='Distribution of |error|')
fig.add_vline(x=med, line_dash='dash', line_color='black', annotation_text=f'median={med:.2f}')
fig.add_vline(x=mean_, line_dash='dot', line_color='crimson', annotation_text=f'mean={mean_:.2f}')
fig.update_layout(xaxis_title='|error|', yaxis_title='count')
fig.show()
3) Outliers: why MedAE is robust#
Because the median has a 50% breakdown point, MedAE can tolerate large outliers as long as they affect < 50% of samples:
With 10% bad points, making those points 10× worse barely changes MedAE.
MAE and RMSE do change because they average (and RMSE squares) the tail.
Let’s simulate that.
n = 300
y_true = rng.normal(0, 1, size=n)
y_pred_base = y_true + rng.normal(0, 0.3, size=n)
outlier_frac = 0.10
k = int(outlier_frac * n)
out_idx = rng.choice(n, size=k, replace=False)
signs = rng.choice([-1.0, 1.0], size=k)
magnitudes = np.linspace(0, 25, 60)
medae_vals = []
mae_vals = []
rmse_vals = []
for m in magnitudes:
y_pred = y_pred_base.copy()
y_pred[out_idx] += signs * m
medae_vals.append(float(np.median(np.abs(y_true - y_pred))))
mae_vals.append(mean_absolute_error(y_true, y_pred))
rmse_vals.append(np.sqrt(mean_squared_error(y_true, y_pred)))
fig = go.Figure()
fig.add_trace(go.Scatter(x=magnitudes, y=medae_vals, mode='lines', name='MedAE (median)'))
fig.add_trace(go.Scatter(x=magnitudes, y=mae_vals, mode='lines', name='MAE (mean)'))
fig.add_trace(go.Scatter(x=magnitudes, y=rmse_vals, mode='lines', name='RMSE (sqrt MSE)'))
fig.update_layout(
title=f'Effect of {outlier_frac:.0%} extreme outliers on common metrics',
xaxis_title='outlier magnitude added to predictions',
yaxis_title='metric value (same units as y)',
)
fig.show()
def medae_with_fraction_outliers(frac, *, seed_offset=0):
local_rng = np.random.default_rng(123 + seed_offset)
k = int(frac * n)
idx = local_rng.choice(n, size=k, replace=False)
signs = local_rng.choice([-1.0, 1.0], size=k)
vals = []
for m in magnitudes:
y_pred = y_pred_base.copy()
y_pred[idx] += signs * m
vals.append(float(np.median(np.abs(y_true - y_pred))))
return vals
medae_10 = medae_with_fraction_outliers(0.10, seed_offset=0)
medae_60 = medae_with_fraction_outliers(0.60, seed_offset=1)
fig = go.Figure()
fig.add_trace(go.Scatter(x=magnitudes, y=medae_10, mode='lines', name='MedAE (10% outliers)'))
fig.add_trace(go.Scatter(x=magnitudes, y=medae_60, mode='lines', name='MedAE (60% outliers)'))
fig.update_layout(
title='Median breakdown: once >50% are outliers, MedAE moves too',
xaxis_title='outlier magnitude added to predictions',
yaxis_title='MedAE',
)
fig.show()
4) NumPy implementation (from scratch)#
scikit-learn’s median_absolute_error supports:
1D targets: `y.shape == (n_samples,)`
multi-output targets: `y.shape == (n_samples, n_outputs)`
aggregation via `multioutput`:
`'raw_values'` → one MedAE per output
`'uniform_average'` → average over outputs (default)
array-like weights → weighted average over outputs
A note on weights: scikit-learn’s median_absolute_error gained sample_weight support only in version 0.24 (implemented via a weighted percentile); earlier versions rejected it, partly because there is no universally-agreed “weighted median” convention.
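If you need a standalone weighted MedAE, one reasonable (but not canonical) convention is the lower weighted median: the smallest error at which the cumulative weight reaches half the total. A minimal sketch under that assumption (`weighted_medae` and `weighted_median` are our names, not library functions):

```python
import numpy as np

def weighted_median(values, weights):
    """Lower weighted median: smallest value whose cumulative weight
    reaches half the total weight. One of several possible conventions."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def weighted_medae(y_true, y_pred, sample_weight):
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float(weighted_median(abs_err, sample_weight))

# With uniform weights this reduces to the (lower) median of |error|
y_true = np.array([2.0, 0.0, 4.0, 1.0, 100.0])
y_pred = np.array([1.5, 0.2, 3.0, 2.0, 0.0])
print(weighted_medae(y_true, y_pred, np.ones(5)))  # 1.0

# Up-weighting the outlier sample drags the weighted median into the tail
print(weighted_medae(y_true, y_pred, np.array([1.0, 1.0, 1.0, 1.0, 10.0])))  # 100.0
```

Other conventions (e.g. interpolating between the two straddling values, as a weighted percentile does) give slightly different answers on ties, which is exactly why the choice should be stated explicitly.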
def _as_2d(y):
y = np.asarray(y)
if y.ndim == 1:
return y.reshape(-1, 1)
if y.ndim == 2:
return y
raise ValueError(f'y must be 1D or 2D, got shape={y.shape}')
def median_absolute_error_np(y_true, y_pred, *, multioutput='uniform_average'):
"""NumPy implementation compatible with sklearn.metrics.median_absolute_error.
Parameters
----------
y_true, y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
multioutput : {'raw_values', 'uniform_average'} or array-like of shape (n_outputs,)
Returns
-------
float or np.ndarray
"""
y_true_2d = _as_2d(y_true)
y_pred_2d = _as_2d(y_pred)
if y_true_2d.shape != y_pred_2d.shape:
raise ValueError(
f'y_true and y_pred must have the same shape, got {y_true_2d.shape} vs {y_pred_2d.shape}'
)
abs_errors = np.abs(y_true_2d - y_pred_2d)
med_per_output = np.median(abs_errors, axis=0)
if isinstance(multioutput, str):
if multioutput == 'raw_values':
return med_per_output
if multioutput == 'uniform_average':
return float(np.mean(med_per_output))
raise ValueError(
"multioutput must be 'raw_values', 'uniform_average', or an array of output weights"
)
weights = np.asarray(multioutput, dtype=float)
if weights.shape != med_per_output.shape:
raise ValueError(
f'multioutput weights must have shape {med_per_output.shape}, got {weights.shape}'
)
return float(np.average(med_per_output, weights=weights))
# 1D check
y_true = rng.normal(size=50)
y_pred = y_true + rng.normal(scale=0.3, size=50)
print('MedAE (np) =', median_absolute_error_np(y_true, y_pred))
print('MedAE (sklearn)=', median_absolute_error(y_true, y_pred))
# Multi-output check
y_true = rng.normal(size=(80, 3))
y_pred = y_true + rng.normal(scale=0.5, size=(80, 3))
print('\nraw_values (np) =', median_absolute_error_np(y_true, y_pred, multioutput='raw_values'))
print('raw_values (sklearn)=', median_absolute_error(y_true, y_pred, multioutput='raw_values'))
weights = np.array([0.2, 0.3, 0.5])
print('\nweighted (np) =', median_absolute_error_np(y_true, y_pred, multioutput=weights))
print('weighted (sklearn)=', median_absolute_error(y_true, y_pred, multioutput=weights))
MedAE (np) = 0.22811342804441603
MedAE (sklearn)= 0.22811342804441603
raw_values (np) = [0.3048 0.3299 0.3327]
raw_values (sklearn)= [0.3048 0.3299 0.3327]
weighted (np) = 0.3263042693906659
weighted (sklearn)= 0.3263042693906659
5) Using MedAE to optimize a model#
MedAE is great as an evaluation metric, but it’s awkward as a training loss:
it uses a median (an order statistic)
it is non-smooth and has plateaus
for many models, the objective is not convex
Still, you can optimize it directly with derivative-free methods. We’ll do two simple cases:
the best constant predictor under MedAE (closed form)
fitting a line by directly minimizing MedAE with a randomized search
5.1 Constant predictor: minimize MedAE(y, c)#
Consider the constant model \(\hat{y}_i = c\) for all \(i\).
The objective is:
$$ J(c) = \operatorname{median}_i\, |y_i - c| $$
Interpretation in 1D:
Let \(r \ge 0\).
The condition \(J(c) \le r\) means at least half of the points lie inside the interval \([c-r,\, c+r]\).
So minimizing \(J(c)\) is equivalent to:
Find the shortest interval that contains at least half the samples, then place \(c\) in the middle of that interval.
Algorithm (odd \(n\); NumPy/scikit-learn’s median uses the same idea for even \(n\)):
Sort \(y_{(1)} \le \dots \le y_{(n)}\)
Let \(k = \lceil n/2 \rceil\)
Slide a window of size \(k\) over the sorted values and compute the widths:
$$ w_i = y_{(i+k-1)} - y_{(i)}, \quad i = 1, \dots, n-k+1 $$
Pick the tightest window \(i^* = \arg\min_i w_i\)
Set:
$$ c^* = \frac{y_{(i^*)} + y_{(i^*+k-1)}}{2} $$
This is different from MAE: minimizing MAE over constants yields the (usual) median of \(y\), while minimizing MedAE finds the center of the densest half of the data (a “mode-like” behavior).
def best_constant_medae(y):
"""Return (c_star, MedAE(y, c_star)) for the constant predictor y_hat=c.
Uses the shortest half-sample interval (window of n // 2 + 1 sorted points).
"""
y = np.asarray(y, dtype=float).ravel()
if y.size == 0:
raise ValueError('y must be non-empty')
y_sorted = np.sort(y)
n = y_sorted.size
k = n // 2 + 1  # smallest window size guaranteeing median(|y - c|) <= half its width
widths = y_sorted[k - 1 :] - y_sorted[: n - k + 1]
i = int(np.argmin(widths))
c_star = 0.5 * (y_sorted[i] + y_sorted[i + k - 1])
medae_star = float(np.median(np.abs(y - c_star)))
return c_star, medae_star
# A two-cluster example (dense cluster on the right)
y = np.array([0, 1, 2, 3, 100, 101, 102, 103, 104], dtype=float)
c_median = float(np.median(y)) # minimizer of MAE over constants
c_star, medae_star = best_constant_medae(y)
c_grid = np.linspace(y.min() - 5, y.max() + 5, 800)
loss = np.array([float(np.median(np.abs(y - c))) for c in c_grid])
print(f'median(y) (MAE-optimal constant) = {c_median:.2f}')
print(f'c* (MedAE-optimal constant) = {c_star:.2f}')
print(f'MedAE(y, c*) = {medae_star:.2f}')
print(f'MedAE(y, median(y)) = {float(np.median(np.abs(y - c_median))):.2f}')
fig = go.Figure()
fig.add_trace(go.Scatter(x=c_grid, y=loss, mode='lines', name='MedAE(y, c)'))
fig.add_vline(
x=c_median,
line_dash='dot',
line_color='gray',
annotation_text=f'median(y)={c_median:.1f}',
)
fig.add_vline(
x=c_star,
line_dash='dash',
line_color='black',
annotation_text=f'c*={c_star:.1f}',
)
fig.update_layout(
title='Constant predictor: minimizing MedAE targets the densest half of y',
xaxis_title='constant prediction c',
yaxis_title='MedAE(y, c)',
)
fig.show()
median(y) (MAE-optimal constant) = 100.00
c* (MedAE-optimal constant) = 102.00
MedAE(y, c*) = 2.00
MedAE(y, median(y)) = 4.00
5.2 Fitting a line by directly minimizing MedAE#
Now consider a simple linear model:
$$ \hat{y}_i = b_0 + b_1 x_i $$
If we train by MedAE, the objective is:
$$ \min_{b_0,\, b_1}\; \operatorname{median}_i\, \big|\, y_i - (b_0 + b_1 x_i) \,\big| $$
This surface is typically non-smooth (lots of flat regions), so we’ll use a very simple randomized hill-climbing optimizer:
start from an initial guess (we’ll use OLS as a warm start)
randomly propose small changes to \((b_0, b_1)\)
keep the proposal only if it reduces MedAE
slowly shrink the proposal scale
This isn’t production-grade optimization, but it shows how the metric can be used as a direct objective.
n = 180
x = rng.uniform(-2, 2, size=n)
b0_true, b1_true = 1.0, 2.0
y_clean = b0_true + b1_true * x + rng.normal(0, 0.3, size=n)
y = y_clean.copy()
n_out = int(0.15 * n)
out_idx = rng.choice(n, size=n_out, replace=False)
y[out_idx] += rng.normal(0, 8.0, size=n_out)
X = np.column_stack([np.ones_like(x), x])
b0_ols, b1_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(f'true params: b0={b0_true:.3f}, b1={b1_true:.3f}')
print(f'OLS params: b0={b0_ols:.3f}, b1={b1_ols:.3f}')
is_outlier = np.zeros(n, dtype=bool)
is_outlier[out_idx] = True
colors = np.where(is_outlier, 'crimson', 'royalblue')
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=x,
y=y,
mode='markers',
name='observations',
marker=dict(color=colors, size=7, opacity=0.85),
)
)
fig.update_layout(title='Synthetic regression data with outliers', xaxis_title='x', yaxis_title='y')
fig.show()
true params: b0=1.000, b1=2.000
OLS params: b0=1.270, b1=2.012
def medae_for_line(x, y, b0, b1):
y_hat = b0 + b1 * x
return float(np.median(np.abs(y - y_hat)))
def fit_line_medae_hillclimb(
x,
y,
*,
b0_init=0.0,
b1_init=0.0,
n_steps=8000,
step0=1.5,
decay=0.9995,
):
b0, b1 = float(b0_init), float(b1_init)
best = medae_for_line(x, y, b0, b1)
step_b0 = float(step0)
step_b1 = float(step0)
history = {
'step': [],
'b0': [],
'b1': [],
'medae': [],
}
for t in range(n_steps):
cand_b0 = b0 + step_b0 * rng.normal()
cand_b1 = b1 + step_b1 * rng.normal()
cand = medae_for_line(x, y, cand_b0, cand_b1)
if cand < best:
b0, b1, best = cand_b0, cand_b1, cand
history['step'].append(t)
history['b0'].append(b0)
history['b1'].append(b1)
history['medae'].append(best)
step_b0 *= decay
step_b1 *= decay
return (b0, b1), history
(b0_medae, b1_medae), hist = fit_line_medae_hillclimb(
x,
y,
b0_init=b0_ols,
b1_init=b1_ols,
n_steps=8000,
step0=1.5,
decay=0.9995,
)
def summarize(b0, b1):
y_hat = b0 + b1 * x
return {
'MedAE': float(np.median(np.abs(y - y_hat))),
'MAE': mean_absolute_error(y, y_hat),
'RMSE': np.sqrt(mean_squared_error(y, y_hat)),
}
print('OLS metrics:', summarize(b0_ols, b1_ols))
print('MedAE-optimized metrics:', summarize(b0_medae, b1_medae))
fig = px.line(
y=hist['medae'],
title='Hill-climbing: best MedAE so far',
labels={'index': 'step', 'y': 'best MedAE'},
)
fig.show()
x_line = np.linspace(x.min(), x.max(), 200)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode='markers', name='data', marker=dict(color=colors, size=7, opacity=0.85)))
fig.add_trace(
go.Scatter(
x=x_line,
y=b0_ols + b1_ols * x_line,
mode='lines',
name='OLS (min MSE)',
line=dict(color='black', dash='dot'),
)
)
fig.add_trace(
go.Scatter(
x=x_line,
y=b0_medae + b1_medae * x_line,
mode='lines',
name='Min MedAE (hill-climb)',
line=dict(color='seagreen'),
)
)
fig.update_layout(title='Line fit: OLS vs directly minimizing MedAE', xaxis_title='x', yaxis_title='y')
fig.show()
OLS metrics: {'MedAE': 0.3255882359758968, 'MAE': 1.2562381006417402, 'RMSE': 3.206246485594046}
MedAE-optimized metrics: {'MedAE': 0.23791194121513426, 'MAE': 1.198741198055658, 'RMSE': 3.212982820690415}
# Visualize the MedAE surface over (b0, b1) with the (downsampled) search path
b0_grid = np.linspace(b0_ols - 4.0, b0_ols + 4.0, 160)
b1_grid = np.linspace(b1_ols - 4.0, b1_ols + 4.0, 160)
B0, B1 = np.meshgrid(b0_grid, b1_grid)
residuals = y[None, None, :] - (B0[:, :, None] + B1[:, :, None] * x[None, None, :])
Z = np.median(np.abs(residuals), axis=2)
stride = max(1, len(hist['step']) // 150)
b0_path = np.array(hist['b0'])[::stride]
b1_path = np.array(hist['b1'])[::stride]
fig = go.Figure()
fig.add_trace(
go.Contour(
x=b0_grid,
y=b1_grid,
z=Z,
contours_coloring='heatmap',
colorbar=dict(title='MedAE'),
)
)
fig.add_trace(go.Scatter(x=b0_path, y=b1_path, mode='lines', name='search path', line=dict(color='white')))
fig.add_trace(
go.Scatter(
x=[b0_ols],
y=[b1_ols],
mode='markers',
name='OLS start',
marker=dict(color='black', symbol='x', size=10),
)
)
fig.add_trace(
go.Scatter(
x=[b0_medae],
y=[b1_medae],
mode='markers',
name='best found',
marker=dict(color='lime', size=10),
)
)
fig.update_layout(
title='MedAE surface over (b0, b1): non-smooth & plateau-heavy',
xaxis_title='b0 (intercept)',
yaxis_title='b1 (slope)',
)
fig.show()
6) Pros, cons, and when to use MedAE#
Pros#
Robust to outliers: the median ignores extreme errors until they affect >50% of samples.
Interpretable units: same unit as the target.
Measures typical performance (50th percentile), which can match user-facing targets (“median miss”).
Cons / pitfalls#
Hides tail risk: two models can have the same MedAE while one has much worse rare failures.
Hard to optimize directly: the median makes the objective non-smooth with plateaus; gradient methods don’t apply cleanly.
Weighted versions are awkward: “weighted median” is ambiguous, and scikit-learn only added sample_weight support for MedAE in version 0.24.
Can be unstable on small datasets (the median jumps when a single point crosses the middle).
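The small-sample instability is easy to demonstrate: with five residuals, MedAE is entirely whatever the middle one happens to be, so moving a single point past its neighbors shifts the metric dramatically (toy numbers for illustration):

```python
import numpy as np

# Five absolute errors: MedAE is determined solely by the middle one.
abs_err = np.array([0.1, 0.2, 1.0, 5.0, 6.0])
print(np.median(abs_err))  # 1.0

# Nudge just that one point across its neighbors and MedAE jumps 1.0 -> 5.0,
# even though the other four errors are untouched.
abs_err2 = np.array([0.1, 0.2, 5.5, 5.0, 6.0])
print(np.median(abs_err2))  # 5.0
```

Mean-based metrics degrade gradually here (MAE moves by 0.9 in this example), which is why reporting MedAE alongside MAE/RMSE is safer on small validation sets.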
Good use cases#
Regression with heavy-tailed noise / occasional gross outliers (sensor glitches, messy labels).
When “typical error” matters more than worst-case or average (e.g., median ETA error).
As a validation metric alongside MAE/RMSE/max error to capture multiple aspects of error.
7) Exercises#
Create a dataset where 40% of points are extreme outliers. Verify MedAE barely moves, then repeat with 60% and observe the breakdown.
Implement a weighted median absolute error (choose a clear weighted-median convention) and test it on toy data.
Compare OLS vs MedAE-optimized line on data with asymmetric noise (one-sided outliers). Which metric better matches what you care about?
References#
scikit-learn docs: https://scikit-learn.org/stable/modules/model_evaluation.html#median-absolute-error
Robust statistics (median, breakdown point): https://en.wikipedia.org/wiki/Robust_statistics
Rousseeuw, Leroy — Robust Regression and Outlier Detection (least-median ideas)